Skip to content

Conversation

@mdh1418
Copy link
Member

@mdh1418 mdh1418 commented Jan 30, 2026

Android CoreCLR currently does not ship with CreateDump. When native crashes occur, it is difficult to figure out the underlying cause of the crash, as managed stacks aren't captured anywhere, and android's crash reporter doesn't understand runtime symbols when it generates a tombstone.

Until we have a solution for creating a dump on Android CoreCLR, this PR looks to log managed callstacks for the crashing thread whenever a dump would have been created. That way, there is some hint as to what may have caused the crash, despite not having the native callstack.

These managed callstacks will only be emitted during FailFast. Synchronous faults currently will not hit PROCCreateCrashDumpIfEnabled as Android registers default signal handlers that the runtime defer signal handling to. See #123735

Example ADB log

01-30 17:08:40.422 25992 26031 I DOTNET  : Hello, Android!
01-30 17:08:40.436 25992 26031 F DOTNET  : =================================================================
01-30 17:08:40.437 25992 26031 F DOTNET  : 	Managed Stacktrace:
01-30 17:08:40.437 25992 26031 F DOTNET  : =================================================================
01-30 17:08:40.452 25992 26031 E DOTNET  :    at System.Runtime.EH.DispatchEx(System.Runtime.StackFrameIterator ByRef, ExInfo ByRef)
01-30 17:08:40.453 25992 26031 E DOTNET  :    at Program.ThrowCatchExceptions(Int32)
01-30 17:08:40.453 25992 26031 E DOTNET  :    at Program.Main(System.String[])
01-30 17:08:40.453 25992 26031 F DOTNET  : =================================================================
01-30 17:08:40.453 25992 26031 F DOTNET  : Aborting process.

@mdh1418 mdh1418 requested a review from lateralusX January 30, 2026 23:27
@mdh1418 mdh1418 self-assigned this Jan 30, 2026
Copilot AI review requested due to automatic review settings January 30, 2026 23:27
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables Android CoreCLR to log managed stack traces when a native crash occurs, providing diagnostic information in environments where CreateDump is not available. The logging is wired through the existing crash-dump hook so that, whenever a dump would be created, the crashing managed thread’s stack is emitted to Android’s logging.

Changes:

  • Add a HOST_ANDROID helper in EEPolicy that walks and logs the current managed thread’s call stack using existing LogCallstackForLogWorker infrastructure.
  • Extend PROCCreateCrashDumpIfEnabled on Android to call this helper via a weak symbol and emit a formatted “Managed Stacktrace” section to minipal_log_write_fatal, followed by the existing abort message.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/coreclr/vm/eepolicy.cpp Introduces LogCallstackForAndroidNativeCrash, an extern "C" Android-only helper that obtains the current Thread and logs its managed call stack, to be invoked from the PAL crash-dump path.
src/coreclr/pal/src/thread/process.cpp On Android, adds a weak reference to LogCallstackForAndroidNativeCrash and, when present, logs a managed stacktrace banner and the managed stack in PROCCreateCrashDumpIfEnabled before logging that the process is aborting.

Comment on lines 2751 to 2755
minipal_log_write_fatal("\n=================================================================\n");
minipal_log_write_fatal("\tManaged Stacktrace:\n");
minipal_log_write_fatal("=================================================================\n");
LogCallstackForAndroidNativeCrash();
minipal_log_write_fatal("=================================================================\n");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lateralusX I briefly looked into emitting the callstacks in the same manner that CrashInfo.cs does it. It seemed like its format was designed for managed exceptions, and I don't know if it'd be useful to stub in fields like the type, hr, message, etc. until we know what tools reading the adb log would expect.

If we did want to format it a particular way, it looed liked we'd be adding another API to CallStackLogger, for what seems like a temporary solution until Android CoreCLR can create a dump.

Copy link
Member

@jkotas jkotas Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not have to stub anything in the format produced by CrashInfo.cs.

We have intentionally used json for this format. json makes it straightforward to omit fields that are not relevant or available in the given situation, and also add additional information specific for the given situation. We have number of optional fields like that, for example

if (moduleBase != nint.Zero)
{
if (!WriteHexValue("module"u8, (nuint)moduleBase))
return false;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to, or we dont need to adopt the same format?

We were considering using the same format since NativeAOT uses it in some scenarios. If we end up generating a dump on Android CoreCLR, it would be nice if it could automatically work with !crashinfo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to, or we dont need to adopt the same format?

Sorry, I meant to "We do not nave to stub..."

We were considering using the same format since NativeAOT uses it in some scenarios.

Yes, it would be great to have on json format for crash reports.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to store this json file in some directory that the apps have write access to by default? Is there a directory like that on Android?

I do not think we want to be printing it to Android console (by default) since it tends to be many pages of text.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should generate a crash report in the same json format as used by NAOT and crashdumps --crashreport and --crashreportonly. It needs to run in-proc, so the data we put into the crash report needs to be retrieved on best effort and implementation should be aware that it can run from a signal handler. It should follow the same env as we do on other platforms (but only supporting the arguments making sense for crash reports).

Question is if shouldn't be on by default? Reason why it shouldn't be on by default is that the app needs additional service to get the file of device and if nothing like that is used, we could fill up the devices with crash reports and app users needs to explicitly clean them up on an app-by-app basis or by the app on relaunch. If an app has additional infrastructure/services to upload managed crash reports, then they would enable crash reports support.

Each app can have private files, it can have a private cache folder (nuked by the system), it can use external storage as well as using shared storage with other apps. An app can't write into the crash folders used by the system crash daemon.

I think that an app should opt-in to managed crash reporting, give runtime the path to the location of the crash report + additional configuration through existing env crash dump env variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could fill up the devices with crash reports and app users needs to explicitly clean them up on an app-by-app basis or by the app on relaunch.

If is on by default, we need to make sure to store the last crash reports only (or only a few crash reports). Overwrite or delete older crash reports as new ones are created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if we decide to enable full crash report by default on Android we need to constrain amount of crash reports, since it will be in text format we could also consider storing it in compressed format to reduce disk footprint even more.

@mdh1418 mdh1418 force-pushed the android_coreclr_native_crash_log_managed_callstack branch from 3d0912d to ffb92cc Compare January 30, 2026 23:40
// TODO: Dump all managed threads callstacks into logcat and/or file?
if (LogCallstackForAndroidNativeCrash != nullptr)
{
minipal_log_write_fatal("\n=================================================================\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the ======================== separators? We do not use a separators like that anywhere else in CoreCLR.

Copy link
Member Author

@mdh1418 mdh1418 Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No they're not needed, I had initially just used the format that Android Mono was using

g_async_safe_printf ("\n=================================================================\n");
g_async_safe_printf ("\tManaged Stacktrace:\n");
g_async_safe_printf ("=================================================================\n");
mono_walk_stack_full (print_stack_frame_signal_safe, mctx, jit_tls, mono_get_lmf (), MONO_UNWIND_LOOKUP_IL_OFFSET | MONO_UNWIND_SIGNAL_SAFE, NULL);
g_async_safe_printf ("=================================================================\n");

@jkotas
Copy link
Member

jkotas commented Jan 31, 2026

What is the experience for unhandled managed exceptions or fail fasts with this fix? We should make sure that the stacktrace is logged just once.

@jkotas
Copy link
Member

jkotas commented Jan 31, 2026

Example ADB log

What was the source code or runtime change for this example?

if (LogCallstackForAndroidNativeCrash != nullptr)
{
minipal_log_write_fatal("\n=================================================================\n");
minipal_log_write_fatal("\tManaged Stacktrace:\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should something more self-explanatory than just "Managed stacktrace:". Maybe something like "Crash in native code".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought PROCCreateCrashDumpIfEnabled also gets hit for unhandled managed exceptions. Would it still be fine to classify that as crash in native code? Maybe also Crash report: would align more given the hope is this would be replaced by real create dump logic.

@vitek-karas
Copy link
Member

Just curious - would we also consider printing out full mixed callstack (native/managed)? The OS stackwalker will only be able to walk the first native section of the stack, once it hits managed code it will stop. So if there's native code above the managed code which would be interesting, we will have no way to learn about it. We could possibly print out just native addresses for the native portion, and rely on symbolication later.

@mdh1418
Copy link
Member Author

mdh1418 commented Jan 31, 2026

What was the source code or runtime change for this example?

I'm modifying the Android.Device_Emulator.JIT.Test Functional test as that was the easiest way to apply runtime changes.
I've been using the following forced native crashes:

  • FailFast via RaiseFailFastException placed in eventpipe and using the in-proc native EventListener
  • Segfault via *(volatile int*)0 = 0x1234; in eventpipe and using the in-proc native EventListener
  • Segfault via P/Invoke memset(nuull, 0, 1); in managed
  • Unhandled managed exception

That particular log in the description is the RaiseFailFastException in ep_buffer_write_event.

This log is from unhandled managed exception (haven't yet changed the format yet from discussions in above threads)

01-31 14:45:41.891  1346  1372 E DOTNET  : Unhandled exception. System.Exception: MIHW
01-31 14:45:41.891  1346  1372 E DOTNET  :    at Program.ForceNativeSegv()
01-31 14:45:41.891  1346  1372 E DOTNET  :    at Program.Main(String[] args)
01-31 14:45:41.892  1346  1372 F DOTNET  : =================================================================
01-31 14:45:41.892  1346  1372 F DOTNET  : 	Managed Stacktrace:
01-31 14:45:41.892  1346  1372 F DOTNET  : =================================================================
01-31 14:45:41.893  1346  1372 E DOTNET  :    at Program.ForceNativeSegv()
01-31 14:45:41.893  1346  1372 E DOTNET  :    at Program.Main(System.String[])
01-31 14:45:41.893  1346  1372 F DOTNET  : =================================================================
01-31 14:45:41.893  1346  1372 F DOTNET  : Aborting process.

@jkotas
Copy link
Member

jkotas commented Feb 2, 2026

Here are my thoughts on this:

  • For unhandled exceptions and fail fasts, the default experience for what you see in the Android console should match what you see on stderr on Windows or Linux. The error message and stacktrace (printed exactly once).

  • For crashes in unmanaged code, the managed stacktrace is better than nothing but it will rarely help with diagnosing the problem. For example, I think it is impossible to connect the dots from the managed stacktrace to the crash in eventpipe in the test you have shared. I expect the Android thumbstone to be a lot more useful for these, yes - it needs to be symbolicated offline. I am not sure whether it is a good idea to print the managed stacktrace to the console in this case since I will likely be misleading most of the time. If we want to do it anyway, it needs to come with a good description of what it is to avoid confusion.

@lateralusX
Copy link
Member

lateralusX commented Feb 2, 2026

Just curious - would we also consider printing out full mixed callstack (native/managed)? The OS stackwalker will only be able to walk the first native section of the stack, once it hits managed code it will stop. So if there's native code above the managed code which would be interesting, we will have no way to learn about it. We could possibly print out just native addresses for the native portion, and rely on symbolication later.

This would be my next suggestion. extend our current stack walker to handler mixed native/managed frames and output symbolicated managed frames, but native frames will probably follow the same format as we get in the tombstone, address + (module + build id) in text. I think this will be beneficial, not just for Android, and it will make it simpler than combining native crash data from tombstone with the managed stack trace in logcat. If we include native frames in our internal stack walker, we also have some more capabilities to handle scenarios that the native stack-walker would have issues handle. On Android we could probably go one additional step on higher API level and use Androids libunwind and cover both native + java frames when stackwalking IP's falling outside dotnet managed code.

Do we all agree it would make sense to extend our runtime stack-walker to have an option to produce mixed stack traces, only used for scenarios like this so shouldn't impact other stack walking usage, unless enabled.

@lateralusX
Copy link
Member

lateralusX commented Feb 2, 2026

What is the experience for unhandled managed exceptions or fail fasts with this fix? We should make sure that the stacktrace is logged just once.

Agree, we should only log the stacktrace for the faulting thread once, since it will go into logcat that is a limited resource. Question is if we should do it on the call site instead of inside PROCCreateCrashDumpIfEnabled and focus that function to produce the json crash report on Android. In the scenario of unhandled managed exception and fail fast the managed stack trace is logged on call site before triggering call ending up in PROCCreateCrashDumpIfEnabled. For native crashes, that would mean logging this information as part of the signal handler, alternative is to change the logic for existing unmanaged exception and fail fast and make sure all scenarios (including signal handler) end up calling through the same function for logging the managed stack and then potentially create the crash dump when enabled. Feels like doing this change will make things a little cleaner and we centralize the logging in one function.

@lateralusX
Copy link
Member

lateralusX commented Feb 2, 2026

Here are my thoughts on this:

  • For unhandled exceptions and fail fasts, the default experience for what you see in the Android console should match what you see on stderr on Windows or Linux. The error message and stacktrace (printed exactly once).
  • For crashes in unmanaged code, the managed stacktrace is better than nothing but it will rarely help with diagnosing the problem. For example, I think it is impossible to connect the dots from the managed stacktrace to the crash in eventpipe in the test you have shared. I expect the Android thumbstone to be a lot more useful for these, yes - it needs to be symbolicated offline. I am not sure whether it is a good idea to print the managed stacktrace to the console in this case since I will likely be misleading most of the time. If we want to do it anyway, it needs to come with a good description of what it is to avoid confusion.

I think we should start with the managed stacktrace when hitting unhandled HW exceptions + tombstone and it should give you enough information and available in logcat or crash reports submitted to the play console. Next step is probably to do a mixed stacktraces since we will get all information in one place and handle more scenarios than what the native stack walker can handle, will stop on first managed frame. On Mono we do try to dump the native stacktrace followed by a managed stacktrace, but the native stacktrace has been disabled on Android, so Mono currently only dumps the managed stacktrace, then the Android crash daemon will generate the native crash report.

Maybe we could put a marker frame on top top frame of the managed stacktrace in case of crash in native code? Then we could do the same for transition in/out of runtime code through the stackwalk and when we have a mixed mode stackwalker, we can just replace those parts with the real native stack frames, something like this:

[External Code]
Program.CallSomeNativeCodeThatCrash()
Program.Main(System.String[])

@jkotas
Copy link
Member

jkotas commented Feb 2, 2026

Do we all agree it would make sense to extend our runtime stack-walker to have an option to produce mixed stack traces

This should be a diagnostic specific stackwalker like what's in createdump today. I do not think we do not want to be teaching the runtime managed stack-walker that is used by the GC, EH, etc. about walking native frames for diagnostics.

if we should do it on the call site instead of inside PROCCreateCrashDumpIfEnabled

Yes. PROCCreateCrashDumpIfEnabled should keep doing what its name says. It should not print stacktraces to console.

[External Code]

Visual Studio stack window uses the same marked to indicate managed frames in managed libraries outside your project.

image

Something similar that's more unique like [Native Code] would work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants